Abstract  —  This  paper  presents  theoretical  results  on 
instability  processes  in  nanoscale  tunneling  structures  that 
were  obtained  from  a  computationally  improved  physics- 
based  simulator.  The  results  were  obtained  from  a 
numerical  implementation  of  the  Wigner-Poisson  electron 
transport  model  upon  a  parallel-computing  platform.  These 
investigations  considered  various  forms  of  multi-barrier 
resonant  tunneling  structures  (RTSs)  and  they  were  used  to 
test  the  robustness  of  the  new  modeling  code.  This  improved 
modeling  tool  is  shown  to  be  fast  and  efficient  with  the 
potential  to  facilitate  complete  and  rigorous  studies  of  this 
time-dependent  phenomenon.  This  is  important  because  it 
will  allow  for  the  study  of  RTSs  embedded  in  realistic 
circuit  configurations.  Hence,  this  advanced  simulation  tool 
will  allow  for  the  detailed  study  of  RTS  devices  coupled  to 
circuits  where  numerical  simulations  in  time  and  iterative 
numerical  optimization  over  the  circuit  parameters  are 
required.  Therefore,  this  work  will  enable  the  future  study 
of  RTS-based  circuits  operating  at  very  high  frequencies. 

I.  Introduction 

The  accurate  study  of  instability  processes  in  nanoscale 
tunneling  structures  presents  new  and  formidable 
theoretical  challenges.  A  complete  and  rigorous  study  of 
electronic  instabilities  in  nanostructures  requires  a 
detailed  consideration  of  time-dependent  quantum 
mechanical  effects  and  this  leads  to  computationally 
intensive  numerical  simulation.  These  investigations 
utilize  a  Wigner-Poisson  model  to  study  electron 
transport  and  intrinsic  oscillations  within  double-  and 
triple-barrier  resonant  tunneling  structures  (RTSs). 
Studies  of  time-dependent  processes  in  nanostructures  are 
important  because  it  is  believed  that  if  the  dynamics 
surrounding  intrinsic  oscillations  can  be  understood  and 
controlled  then  resonant  tunneling  structures  have  the 
potential  to  supply  significant  levels  of  output  power  at 
very high frequencies [1]. Here, an advanced and fast
numerical algorithm is developed and implemented on a
parallel-computing  platform  to  facilitate  these  time- 
dependent  investigations. 

This  paper  presents  the  details  of  that  numerical 
algorithm  which  allows  for  scientific  investigations  of  the 
underlying  origins  of  the  quantum-based  fluctuations. 
This  fast  solver  is  based  on  a  complete  restructuring  of  an 
original  Wigner-Poisson  simulator  that  was  the  first  to 
theoretically demonstrate intrinsic oscillations within
resonant  tunneling  diodes.  This  improved  solver  also 
employs  a  new  higher-order  Runge-Kutta  method  and 
utilizes  efficient  programming  constructs  that  encourage 
parallelization  and  the  effective  use  of  modern  multi-level 
caches.  This  simulator  allows  for  detailed  investigations 
of  the  complex  quantum-mechanical  coupling  within  the 
multi-quantum-well  systems.  These  scientific  studies 
reveal  fundamental  insight  into  new  methods  for 
generating  and  enhancing  intrinsic  oscillations  within 
multi-barrier  quantum-well  systems.  Most  importantly, 
the  parallel-platform-based  numerical  simulator 
developed  here  will  allow  for  the  efficient  study  and 
advanced  design  of  novel  nanostructures  that  have  the 
potential  for  functioning  as  very  high  frequency  electronic 
sources.  Here,  practical  design  studies  will  require  the 
detailed  analysis  of  RTSs  that  are  embedded  within 
optimized  circuit  configurations.  Hence,  numerical 
optimization  iterations  are  required  for  each  time-domain 
simulation  and  this  mandates  the  need  for  a  fast  and 
efficient  electron  transport  simulator. 

II. Wigner-Poisson Electron Transport Model

The  Wigner  function  formulation  of  quantum  mechanics 
was  selected  for  these  investigations  into  RTSs  because  of 
its  many  useful  characteristics  for  the  simulation  of 
quantum-effect electronic devices, including the natural
ability to handle dissipative and open-boundary systems.
The  Wigner  function  can  be  combined  with  the  Poisson 
equation  to  provide  for  an  adequate  quantum  mechanical 
description  of  the  electron  transport  through  tunneling 
nanostructures.  The  Wigner-Poisson  (WP)  model  has 
been  used  by  many  groups  in  the  past  [2]  and  we  have 
applied  it  to  isolated  RTSs  to  provide  a  qualitative 
explanation  for  the  origins  of  the  intrinsic  oscillation  [3] 
and to reveal techniques for enhancing the effect [4]. The
focus of this work is on improving the computational
aspects of numerically solving the WP model equations
subject to the necessary and sufficient boundary
conditions. Details regarding the derivation can be found
elsewhere [2], but the model is a two-equation system
with the basic mathematical form

[Eqs. (1) and (2)]

where the last term in Eq. (1) is given by

$W(f) = C_1 k \frac{\partial f}{\partial x} - K_0(f) + \left(\frac{\partial f}{\partial t}\right)_{coll}$  (3)

with the physical constant $C_1 = -h/(2\pi m^*)$ and the
integral expression defined by

[Eq. (4)]

where L is the length of the tunneling structure under
consideration. The last term in Eq. (3) is due to scattering
dissipation and is modeled using the relaxation time
approximation [1]. The boundary conditions on
f(x,k) at the emitter (x = 0) and collector (x = L) are
specified to approximate flat-band transport. Here,
equilibrium electron-distribution conditions are
prescribed for values of the Wigner function at x = 0 that
correspond to injection from the left (i.e., k > 0) and at
x = L that correspond to injection from the right (i.e.,
k < 0) according to the mathematical relations [1]

[Eq. (5)]

where $\mu_0$ and $\mu_L$ are the known Fermi energies at the
source and collector, respectively. The total potential
energy function of the structure is given by

$U(x) = u(x) + \Delta_c(x)$  (6)

where $\Delta_c(x)$ is the band offset function that defines the
barriers and wells within the RTS. The total potential
energy is dependent on the Wigner function through

[Eq. (7)]

which defines the electron density function n(x). Once the
electron density profile is defined, u(x) can be determined
by solving the Poisson equation in Eq. (2) using a
specified doping profile function $N_d(x)$ and the applied
bias boundary conditions

$u(0) = 0$, and $u(L) = -V_{bias}$.  (8)

Finally,  the  electron  current  density  through  the  RTS  is 
given  by  the  relation 

$j(x) = \frac{h}{2\pi} \int dk \, \frac{k}{m^*} f(x,k)$  (9)

where  m*  is  the  free  electron  mass. 
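
The Poisson step of the model, in which u(x) is obtained subject to the bias boundary conditions u(0) = 0 and u(L) = -V_bias, can be illustrated with a minimal finite-difference sketch. The uniform grid, the function name solve_poisson_1d, and the generic right-hand side rhs are assumptions made for illustration only. Here rhs stands in for the charge term built from the doping profile and the electron density, whose physical constants are not restated. Nothing here is taken from the simulator itself.

```python
def solve_poisson_1d(rhs, length, v_bias):
    """Solve u''(x) = rhs(x) on [0, length] with u(0) = 0 and u(length) = -v_bias.

    rhs: right-hand-side values at the interior points of a uniform grid
         (a hypothetical stand-in for the charge term of the Poisson equation).
    Returns u at every grid point, boundary points included.
    """
    n = len(rhs)
    dx = length / (n + 1)

    # Tridiagonal system from the second difference: u[i-1] - 2 u[i] + u[i+1] = dx^2 rhs[i].
    d = [dx * dx * r for r in rhs]
    d[-1] += v_bias  # known boundary value u(length) = -v_bias moved to the right side

    # Thomas algorithm, forward elimination (sub- and super-diagonals are 1, diagonal is -2).
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = d[0] / -2.0
    for i in range(1, n):
        denom = -2.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (d[i] - d_prime[i - 1]) / denom

    # Back substitution.
    u = [0.0] * n
    u[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]

    return [0.0] + u + [-v_bias]
```

With a zero right-hand side the sketch returns a linear potential drop from 0 to -V_bias, which is the expected behavior when no net charge is present.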

III.  Parallel  Code  Development 
A.  Basic  Goals 

Many  structural  changes  to  the  original  code  from  [5] 
were  needed  to  enable  parallelism,  conserve  storage,  and 
allow  for  the  use  of  modern  algorithms  for  temporal 
integration  and  solution  of  linear  and  nonlinear 
equations. The structural changes included:


•  the  reordering  of  loops  to  enhance  locality  of 
references  to  memory, 

•  the  elimination  of  banded  matrix  storage  to 
conserve  memory  and  facilitate  distributed 
memory  computation,  and 

•  the  replacement  of  basic  vector  and  matrix-vector 
loops  with  calls  to  hand-tuned  computational 
kernels [6].

B.  Development  Steps  &  Details 

The  parallelization  of  the  basic  code  has  been  done  on 
a  four-processor  node  of  an  IBM-SP3  at  the  North 
Carolina  Supercomputing  Center  (NCSC).  The  shared 
memory  environment  is  sufficiently  powerful  and  has 
ample  memory  for  simulations  like  those  considered  here 
[3].  Further  work  in  more  than  one  space  dimension  or 
the  use  of  these  models  for  optimization  and  design  will 
require  distributed  memory  platforms  such  as  the  SP  or  a 
Beowulf  cluster. 

The parallel results we report in this paper are based on
loop-level parallelism using the OpenMP programming
environment. In this mode of parallel programming, the
outermost loops of the few most computationally
intensive components of the code are divided among a
small number (2-16) of processors.

The  critical  part  of  the  code  is  the  computation  of  the 
integral  in  Eq.  (4).  Our  approach  to  parallelism  was  to 
use OpenMP directives to obtain four-way parallelism
of  the  outer  loop  and  to  use  calls  to  LAPACK  [6]  to 
speed  up  the  inner  integrals. 
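
The decomposition described above, an outer loop over spatial grid points split among four workers with an independent inner integral at each point, can be sketched as follows. This is a Python analogue for illustration only, not the authors' code. The names inner_integral, integral_term_serial and integral_term_parallel, the kernel array layout, and the rectangle-rule quadrature are all assumptions. Python threads are used only to show how the outer loop is partitioned, and they do not reproduce the performance of compiled loops under parallel directives.

```python
from concurrent.futures import ThreadPoolExecutor


def inner_integral(kernel_row, f_row, dk):
    """Rectangle-rule quadrature of kernel(k') * f(k') over k' for one output point."""
    return dk * sum(w * v for w, v in zip(kernel_row, f_row))


def integral_term_serial(kernel, f, dk):
    """Reference version: plain nested loops over x (outer) and k (inner).

    kernel[i][j][jp] holds hypothetical weights for grid point x_i and momenta k_j, k'_jp;
    f[i][jp] holds the distribution at x_i and k'_jp.
    """
    return [
        [inner_integral(kernel[i][j], f[i], dk) for j in range(len(f[i]))]
        for i in range(len(f))
    ]


def integral_term_parallel(kernel, f, dk, workers=4):
    """Split the outer loop over x among workers; each x row is independent of the others."""

    def one_row(i):
        return [inner_integral(kernel[i][j], f[i], dk) for j in range(len(f[i]))]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input indices.
        return list(pool.map(one_row, range(len(f))))
```

Because no row of the result depends on any other row, the partitioned version returns exactly the same values as the serial loops, which is the property that makes this kind of loop a natural target for loop-level parallelism.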

C.  Temporal  Error  Control 

The original code from [3] used a semi-implicit form
of the Euler method. This method required the solution
of  a  large  linear  system  at  each  time  step.  The  solve  was 
performed  with  a  direct  solver  for  banded  matrices.  The 
expense  of  this  linear  solve  was,  according  to  our 
execution  profiler,  nearly  90%  of  the  execution  time  and 
required  most  of  the  storage  during  the  simulation. 

We have replaced this algorithm with ROCK4 [7,8], an
advanced Runge-Kutta code. ROCK4 uses varying orders
and stages to maximize the intersection of the stability
region and the negative real axis. In this way the
disadvantages of an explicit method with respect to
stability are reduced. An explicit method of this type
incurs no linear algebra costs, either in computation or
storage. A disadvantage of the ROCK4 code is that many
more function evaluations must be computed at each time
step to obtain the desired accuracy than would be
necessary with a standard fourth-order Runge-Kutta code
[9]. The savings in linear algebra costs relative to the old
code are significant, amounting to at least a factor of five.
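
The trade-off between stages and stability can be illustrated with a much simpler relative of ROCK4. The sketch below is not ROCK4, whose fourth-order stability polynomial is more involved. It uses the s-stage first-order Chebyshev method, whose stability polynomial is R_s(z) = T_s(1 + z/s^2) and whose stable interval on the negative real axis is [-2s^2, 0]. The function names are hypothetical.

```python
def chebyshev_t(s, x):
    """Evaluate the Chebyshev polynomial T_s(x) with the three-term recurrence."""
    if s == 0:
        return 1.0
    t_prev, t_curr = 1.0, x
    for _ in range(s - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


def stability_polynomial(s, z):
    """R_s(z) = T_s(1 + z / s^2) for the s-stage first-order Chebyshev method."""
    return chebyshev_t(s, 1.0 + z / (s * s))


def is_stable(s, z, tol=1e-9):
    """True if |R_s(z)| <= 1, i.e. a step with (step size) * (eigenvalue) = z does not grow."""
    return abs(stability_polynomial(s, z)) <= 1.0 + tol


def real_stability_limit(s):
    """Length of the stable interval on the negative real axis, 2 * s^2."""
    return 2.0 * s * s
```

For s = 1 the polynomial reduces to 1 + z, which is explicit Euler with a stable interval of length 2. For s = 10 the interval has length 200, at the cost of ten function evaluations per step. The stable step size therefore grows quadratically with the number of stages while the work grows only linearly, which is why a stabilized explicit method can compete with an implicit one without any linear solves.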

A fully implicit integrator such as VODEPK [10,11],
which uses iterative methods for the large linear systems
that are needed to compute Newton steps, might be even
more efficient if good preconditioners can be found [12].
The authors are investigating this option.

D.  Simulation  Results  and  Statistics 

In  the  studies  reported  here,  the  simulation 
performance  of  the  modified  code  was  considered  for 
three  different  types  of  RTSs.  Namely,  the  modified 
simulator was applied to the following basic RTS types:

(i) the basic double-barrier RTS that originally
demonstrated intrinsic oscillation behavior [3],

(ii) a modified double-barrier RTS that utilizes
emitter engineering of the doping profile to
enhance the intrinsic oscillations, and

(iii) a new double-well RTS that is being considered as an
alternative methodology for realizing subband-coupling
induced oscillatory behavior.

In the previous studies, time-domain simulations were
performed at a discrete set of applied biases that were
slowly swept forward from zero to some maximum value
and then backwards to zero. An example of the averaged
current-voltage (I-V) characteristic across a forward
voltage sweep, obtained from this previous study on a
type-(ii) RTS, is given in Fig. 1.


Fig.  1  Averaged  I-V  results  from  a  type-(ii)  RTS. 


At certain values of applied bias near the bottom of the
negative-differential-resistance region, the RTS will
demonstrate intrinsic oscillations due to quantum
subband-coupling and this process is illustrated in Fig. 2.

Fig. 2 Current versus time from a type-(ii) RTS.

The results obtained from the parallel simulator are both
qualitatively and quantitatively similar to the earlier
work. Most importantly, the new code reduces the total
simulation times from days to hours. Table I presents a
comparison of the simulation times on one and four
processors using the IBM SP-3 at the NCSC. The
performance of the code was estimated with a profiler
and we found that the one critical loop took 80-90% of
the execution time (ET) when in serial mode. A single
OpenMP directive was applied to parallelize this same
critical loop and efficiencies of 60-70% were obtained
using four processors. Here, parallel efficiency is defined
by the relation:

Efficiency = (ET on One Processor) / (ET on Four Processors x 4)

This is very good performance and is consistent with
almost perfect speedup for a block that takes 80-85% of
the CPU time for the serial code.

Table I. CPU time in seconds.

RTS Type   1 processor   4 processors   Efficiency
(i)        16784         6515           0.64
(ii)       21145         7748           0.68
(iii)      32138         13025          0.62

IV. Conclusion

The results of these computational studies have
demonstrated that parallel platforms offer considerable
speedup in the simulation of instabilities in resonant
tunneling structures (RTSs). Hence, this work suggests
that modern scalable computer architectures and loop-level
parallelism may be exploited to facilitate the future
study of nanoelectronic-based circuits operating at very
high frequencies.
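
The efficiencies reported in Table I, and the remark that they are consistent with near-perfect speedup of a block taking 80-85% of the serial time, can be checked with a short calculation. The sketch below recomputes the efficiency column from the tabulated CPU times and compares it with the ideal efficiency given by Amdahl's law for a code in which only a fraction of the serial time is parallelized. The function and variable names are hypothetical, and the use of Amdahl's law is our reading of the argument rather than a formula stated in the text.

```python
def parallel_efficiency(t_serial, t_parallel, processors):
    """Efficiency as defined in the text: ET on one processor / (ET on p processors * p)."""
    return t_serial / (t_parallel * processors)


def amdahl_efficiency(parallel_fraction, processors):
    """Ideal efficiency when only parallel_fraction of the serial time speeds up perfectly."""
    speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)
    return speedup / processors


# CPU times in seconds from Table I: (one processor, four processors).
TABLE_I = {"(i)": (16784, 6515), "(ii)": (21145, 7748), "(iii)": (32138, 13025)}

measured = {
    name: round(parallel_efficiency(t1, t4, 4), 2) for name, (t1, t4) in TABLE_I.items()
}
ideal_low = amdahl_efficiency(0.80, 4)   # critical block is 80% of serial time
ideal_high = amdahl_efficiency(0.85, 4)  # critical block is 85% of serial time
```

The recomputed efficiencies are 0.64, 0.68 and 0.62, matching the table. The ideal values for a block taking 80% and 85% of the serial time are 0.625 and about 0.69, so the measured efficiencies lie in or very near the range expected from perfect speedup of the critical loop alone.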